Rewriting Data Cleaning Operations Defined at a Conceptual Level
Abstract
The handling of increasing amounts of data creates the need to deal with redundant and/or complementary repositories that are disparate in their data models and/or data structures. Current data cleaning techniques developed to tackle data quality problems are only suitable for scenarios where all repositories share the same model and structure. Recently, a novel methodology was proposed to overcome this limitation. In the context of this methodology, a generic process is outlined for rewriting Data Cleaning Operations (DCOs) that are specified at a conceptual level for a given domain ontology, and whose structure and semantics are described by the E-DQM ontology. The rewriting makes them conform to the data structure of a target repository and to the vocabulary required by an existing Data Cleaning Tool. The proposed algorithm includes the validation of the rewritten DCOs and their serialization to the required output format.
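The rewriting process described above has three steps: mapping conceptual references onto the target repository's structure and the tool's vocabulary, validating the result, and serializing it. The following Python sketch illustrates that flow under simplifying assumptions. All of the following are hypothetical and chosen for illustration: `ConceptualDCO`, `TargetDCO`, `RewriteError`, `rewrite_dco`, `serialize`, the two dictionary mappings, and the operation and column names. JSON merely stands in for whatever output format a given Data Cleaning Tool requires. The sketch does not model the E-DQM ontology or reproduce the authors' algorithm.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical conceptual DCO: an operation type plus the domain-ontology
# terms ("Class.property") it refers to. Not the actual E-DQM vocabulary.
@dataclass
class ConceptualDCO:
    operation: str
    targets: list[str]

# Hypothetical rewritten DCO, expressed in the target repository's
# structure ("table.column") and in an assumed tool vocabulary.
@dataclass
class TargetDCO:
    tool_operation: str
    columns: list[str]

class RewriteError(ValueError):
    """Raised when a conceptual DCO cannot be validly rewritten."""

def rewrite_dco(dco, schema_map, vocab_map):
    """Rewrite a conceptual DCO using two illustrative mappings.

    schema_map: ontology term -> "table.column" in the target repository
    vocab_map:  conceptual operation -> operation name used by the tool
    """
    # Validation: the tool must support the requested operation.
    if dco.operation not in vocab_map:
        raise RewriteError(f"unsupported operation: {dco.operation}")
    # Validation: every referenced ontology term must have a mapping.
    missing = [t for t in dco.targets if t not in schema_map]
    if missing:
        raise RewriteError(f"unmapped terms: {missing}")
    return TargetDCO(
        tool_operation=vocab_map[dco.operation],
        columns=[schema_map[t] for t in dco.targets],
    )

def serialize(dco):
    """Serialize a rewritten DCO; JSON is a stand-in output format."""
    return json.dumps(asdict(dco), sort_keys=True)

# Illustrative (hypothetical) mappings and usage.
schema_map = {"Customer.email": "clients.email_addr"}
vocab_map = {"MissingValueDetection": "not_null_check"}
dco = ConceptualDCO("MissingValueDetection", ["Customer.email"])
print(serialize(rewrite_dco(dco, schema_map, vocab_map)))
# prints {"columns": ["clients.email_addr"], "tool_operation": "not_null_check"}
```

The sketch performs validation before it builds the rewritten operation. As a result, an operation that the target tool cannot express, or one that refers to an ontology term with no counterpart in the target repository, is rejected rather than serialized in a partial form.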